Test: Hooks Glue Exporter #6721

Merged
merged 12 commits into from
Oct 19, 2023

@Isan-Rivkin Isan-Rivkin commented Oct 8, 2023

Closes #6686

  • This PR tests the AWS exporters (symlink_exporter and glue_exporter).
  • The test runs only in an AWS context.

Test flow

Symlink Exporter:

  1. Generate CSV data, a Lua script that uses the symlink exporter, and a table definition for _lakefs_tables, then upload them.
  2. Generate an action file that triggers the script.
  3. Wait until the symlinks are created in S3.
  4. Verify that the symlink files on S3 are as expected.

Glue Exporter (relies on the previous step and creates a Glue table based on the data in that commit):

  1. Generate a Lua script that uses the Glue exporter and a lakefs_action to trigger it, then upload them.
  2. Wait for the table to be created in Glue.
  3. Verify that the table was created correctly.
  4. Delete the table when done.

@Isan-Rivkin Isan-Rivkin added the area/testing (Improvements or additions to tests) and export-hooks labels Oct 8, 2023
@Isan-Rivkin Isan-Rivkin self-assigned this Oct 8, 2023
@Isan-Rivkin Isan-Rivkin changed the title from "6691 lua glue export test" to "WIP: Test: Hooks Glue Exporter" Oct 8, 2023
@Isan-Rivkin Isan-Rivkin added the exclude-changelog (PR description should not be included in next release changelog) label Oct 18, 2023
@arielshaqed arielshaqed left a comment

Approving as this is a significant improvement. But note that the condition for running the test is incorrect: we should look at whether we are testing AWS, not whether we are testing S3. Ideally we could fix this now, rather than carry tech debt for whenever we want to test on MinIO somehow.

TableSpec *hiveTableSpec
}

const glueExportScript = `
Contributor

Can you document this please? Docs should say it's Lua (I think...), but also explain what {{...}} in it means.

Contributor

As an alternative, Go has a statically-linked readonly filesystem available. If you used that you could put this in a file, statically link that file into the filesystem, and then the test could read from this file.

Contributor Author

Changed all the templates and strings to files rendered with Go templates!
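
For readers following this thread, here is a minimal sketch of the approach discussed above: keeping the Lua script in a template file and rendering its `{{...}}` placeholders with Go's `text/template`. It is an illustration, not the code from this PR. The file name, the `scriptParams` fields, and `renderFromFS` are hypothetical, and `fstest.MapFS` stands in for an `embed.FS` so the example runs without files on disk.

```go
package main

import (
	"bytes"
	"fmt"
	"io/fs"
	"testing/fstest"
	"text/template"
)

// scriptParams is a hypothetical set of values substituted into the template.
type scriptParams struct {
	DBName    string
	TablePath string
}

// renderFromFS parses the named template file from fsys and renders it with data.
// Each {{.Field}} placeholder in the file is replaced by the matching field of data.
func renderFromFS(fsys fs.FS, name string, data any) (string, error) {
	tpl, err := template.ParseFS(fsys, name)
	if err != nil {
		return "", fmt.Errorf("parse %s: %w", name, err)
	}
	var buf bytes.Buffer
	if err := tpl.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("render %s: %w", name, err)
	}
	return buf.String(), nil
}

// exampleFS stands in for an embed.FS. In a real test the file would live on
// disk and be linked into the binary with a //go:embed directive.
func exampleFS() fs.FS {
	return fstest.MapFS{
		"glue_exporter.lua.tmpl": &fstest.MapFile{
			Data: []byte("local db = \"{{.DBName}}\"\nlocal path = \"{{.TablePath}}\"\n"),
		},
	}
}

func main() {
	out, err := renderFromFS(exampleFS(), "glue_exporter.lua.tmpl", scriptParams{
		DBName:    "export_db",
		TablePath: "_lakefs_tables/animals.yaml",
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

Because `renderFromFS` accepts any `fs.FS`, swapping the in-memory map for an embedded filesystem should not require changes to the rendering code.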

exporter.export_glue(glue, args.catalog.db_name, args.table_source, args.catalog.table_input, action, {debug=true})
`

const glueExporterAction = `
Contributor

Here too. AFAICT this is YAML.

Contributor Author

Changed all the templates and strings to files rendered with Go templates!


func setupGlueClient(ctx context.Context, accessKeyID, secretAccessKey string) (*glue.Client, error) {
	cfg, err := config.LoadDefaultConfig(ctx,
		config.WithRegion("us-east-1"),
Contributor

I think we have an environment variable with the region.

Contributor Author

👍 Made it configurable (it's tied to the location of the DB)

require.Equal(t, http.StatusCreated, commitResp.StatusCode())
return commitResp.JSON201
}
func genCSVData(cols []string, n int) string {
Contributor

Suggested change
func genCSVData(cols []string, n int) string {
func genCSVData(cols []string, n int) string {

Contributor

Please consider using the standard encoding/csv package. Encoding CSVs like this is incredibly brittle.

Contributor Author

100% done!
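
For reference, a sketch of how a generator with the same name might look when built on the standard `encoding/csv` package. The cell contents and the added `error` return value are assumptions for illustration; the version merged in this PR may differ.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
	"strconv"
)

// genCSVData returns a CSV document with a header row of cols followed by n
// data rows. Cell contents are illustrative: "<column>-<row index>".
func genCSVData(cols []string, n int) (string, error) {
	var buf bytes.Buffer
	w := csv.NewWriter(&buf)
	if err := w.Write(cols); err != nil {
		return "", err
	}
	for i := 0; i < n; i++ {
		row := make([]string, len(cols))
		for j, c := range cols {
			row[j] = c + "-" + strconv.Itoa(i)
		}
		if err := w.Write(row); err != nil {
			return "", err
		}
	}
	// Flush buffered rows and surface any write error.
	w.Flush()
	if err := w.Error(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	data, err := genCSVData([]string{"name", "type, weight"}, 2)
	if err != nil {
		panic(err)
	}
	fmt.Print(data)
}
```

The writer quotes fields that contain commas, quotes, or newlines, which is the brittleness that hand-rolled string joining tends to miss.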

func testSymlinkS3Exporter(t *testing.T, ctx context.Context, repo string, tablePaths map[string]string, testData *exportHooksTestData) (string, string) {
	t.Helper()

	tableYaml, err := yaml.Marshal(&testData.TableSpec)
Contributor

Could we just store testData.TableSpec as an interface{}, or possibly a map[string]interface{}, and avoid having to write YAML in code, and then deserialize it back into a map?

Contributor Author

No can do, because that struct is used not only for YAML but also for configuration parameters during the test.
So I would still have to maintain variables for all of those; why not just map them one-to-one with the YAML anyway?


// upload action
commit := uploadAndCommitObjects(t, ctx, repo, mainBranch, map[string]string{
"_lakefs_actions/animals_symlink.yaml": renderTplAsStr(t, testData, "action", symlinkExporterAction),
Contributor

Why does this upload need to be separate from the one on l.196?

Contributor Author

You're right, it doesn't need to be. Consolidated everything into one upload, thanks!

Comment on lines +225 to +227
bo := backoff.NewExponentialBackOff()
bo.MaxInterval = 5 * time.Second
bo.MaxElapsedTime = 30 * time.Second
Contributor

Given that ExponentialBackOff is an exported type, why not just

bo := &backoff.ExponentialBackOff{
	MaxInterval:    5 * time.Second,
	MaxElapsedTime: 30 * time.Second,
}

?

Contributor Author

Because the New func sets other fields that we want to keep as defaults, and those are not set when constructing backoff.ExponentialBackOff{} directly.

})
require.NoErrorf(t, err, "listing symlink files in storage: %s", symlinksPrefix)
if len(listResp.Contents) == 0 {
return fmt.Errorf("no objects found")
Contributor

Prefer errors.New, perhaps?
But I am not sure that returning this as a specific error gives more information than just returning the empty list -- which will also output the expected results.

Contributor Author

I want the retry to keep running if no objects are found; that's why I'm returning an error.
As for the message itself, changed to errors.New.

// Symlinks export: lua script, table in _lakefs_tables, action file, mock table data in CSV form
// Glue export: lua script, table in _lakefs_tables, action file
func TestAWSCatalogExport(t *testing.T) {
	requireBlockstoreType(t, block.BlockstoreTypeS3)
Contributor

I am not sure that this is the ideal thing to use. We are interested in running on AWS, not in running on S3. Suppose we were running on MinIO: presumably Glue would not work there.

Contributor Author

Done! Thanks!

@Isan-Rivkin Isan-Rivkin changed the title from "WIP: Test: Hooks Glue Exporter" to "Test: Hooks Glue Exporter" Oct 19, 2023
@Isan-Rivkin Isan-Rivkin merged commit 7d9feeb into master Oct 19, 2023
29 checks passed
@Isan-Rivkin Isan-Rivkin deleted the 6691-lua-glue-export-test branch October 19, 2023 12:53
@itaiad200
Contributor

Important test! Thank you
